28 research outputs found

    Tri-modal Person Re-identification with RGB, Depth and Thermal Features


    Deep learning with self-supervision and uncertainty regularization to count fish in underwater images

    Effective conservation actions require effective population monitoring. However, accurately counting animals in the wild to inform conservation decision-making is difficult. Monitoring populations through image sampling has made data collection cheaper, wide-reaching and less intrusive, but has created a need to process and analyse this data efficiently. Counting animals from such data is challenging, particularly when they are densely packed in noisy images. Attempting this manually is slow and expensive, while traditional computer vision methods are limited in their generalisability. Deep learning is the state-of-the-art method for many computer vision tasks, but it has yet to be properly explored for counting animals. To this end, we employ deep learning, with a density-based regression approach, to count fish in low-resolution sonar images. We introduce a large dataset of sonar videos, deployed to record wild Lebranche mullet schools (Mugil liza), with a subset of 500 labelled images. We utilise abundant unlabelled data in a self-supervised task to improve the supervised counting task. For the first time in this context, by introducing uncertainty quantification, we improve model training and provide an accompanying measure of prediction uncertainty for more informed biological decision-making. Finally, we demonstrate the generalisability of our proposed counting framework by testing it on a recent benchmark dataset of high-resolution annotated underwater images from varying habitats (DeepFish). From experiments on both contrasting datasets, we demonstrate that our network outperforms the few other deep learning models implemented for solving this task. By providing an open-source framework along with training data, our study puts forth an efficient deep learning template for crowd counting aquatic animals, thereby contributing effective methods to assess natural populations from the ever-increasing visual data.
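The density-based regression idea above can be sketched in plain Python: the model predicts a per-pixel density map whose values sum to the estimated count, and repeated stochastic forward passes (e.g. with dropout left on) give a spread that serves as an uncertainty estimate. The density maps below are hand-made toy inputs standing in for network output; this is a minimal sketch of the counting logic, not the authors' model.

```python
import statistics

def count_from_density_map(density_map):
    """A density map assigns each pixel a fractional animal count;
    summing all pixels yields the image-level count estimate."""
    return sum(sum(row) for row in density_map)

def count_with_uncertainty(density_maps):
    """Given several stochastic forward passes over the same image,
    report the mean count and its standard deviation as an
    uncertainty measure."""
    counts = [count_from_density_map(m) for m in density_maps]
    return statistics.mean(counts), statistics.stdev(counts)

# Toy 2x3 density map whose mass corresponds to roughly three fish.
single_map = [[0.5, 1.0, 0.0],
              [0.5, 1.0, 0.0]]
print(count_from_density_map(single_map))  # 3.0
```

Summing a density map rather than detecting individuals is what makes the approach robust to densely packed, overlapping animals, since no per-fish localisation is required.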

    Video Transformers: A Survey

    Transformer models have shown great success handling long-range interactions, making them a promising tool for modeling video. However, they lack inductive biases and scale quadratically with input length. These limitations are further exacerbated by the high dimensionality introduced with the temporal dimension. While there are surveys analyzing the advances of Transformers for vision, none focus on an in-depth analysis of video-specific designs. In this survey we analyze the main contributions and trends of works leveraging Transformers to model video. Specifically, we first delve into how videos are handled at the input level. Then, we study the architectural changes made to deal with video more efficiently, reduce redundancy, re-introduce useful inductive biases, and capture long-term temporal dynamics. In addition, we provide an overview of different training regimes and explore effective self-supervised learning strategies for video. Finally, we conduct a performance comparison on the most common benchmark for Video Transformers (i.e., action classification), finding them to outperform 3D ConvNets even with less computational complexity.
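The quadratic scaling the survey highlights is easy to quantify: with patch- or tubelet-based tokenisation, the token count grows with the temporal extent, and full self-attention cost grows with the square of the token count. A back-of-the-envelope sketch (the tubelet and patch sizes are illustrative assumptions, not taken from any specific model):

```python
def num_tokens(frames, height, width, t=2, p=16):
    """Tokens produced by a non-overlapping t x p x p tubelet embedding
    (t frames deep, p x p pixels per patch; sizes are illustrative)."""
    return (frames // t) * (height // p) * (width // p)

def attention_pairs(tokens):
    """Full self-attention compares every token with every other,
    so its cost scales with tokens**2."""
    return tokens ** 2

image_like = num_tokens(frames=2, height=224, width=224)    # 196 tokens
video_like = num_tokens(frames=32, height=224, width=224)   # 3136 tokens
print(attention_pairs(video_like) // attention_pairs(image_like))  # 256
```

Going from 2 to 32 frames multiplies the token count by 16 but the attention cost by 256, which is why the architectural changes the survey reviews (factorised attention, token reduction, local windows) matter so much for video.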

    Métodos de reconocimiento del subsuelo

    The aim of this workshop is to introduce Earth Sciences teachers to the main methods of subsoil investigation (namely mechanical and geophysical methods). These methods frequently involve the use of specific equipment, yet they are usually taught in the classroom with no direct contact between the students and the instruments (mechanical boreholes and geophysical instruments are the most common cases). Several activities, all taking place in the surroundings of the University of Girona campus, give workshop attendees the opportunity to make measurements with different equipment. These activities are carried out in the field so as to contribute to the resolution of a previously posed problem. The problems presented are situations, most of them real, in which subsoil investigation techniques are commonly used. These cases have been employed as teaching-learning strategies with university and secondary-school students in the Girona area. Finally, some examples of exercises involving the treatment of data obtained through subsoil investigation techniques are also presented to complement the workshop.

    Action recognition using single-pixel time-of-flight detection

    Action recognition is a challenging task that plays an important role in many robotic systems, which highly depend on visual input feeds. However, due to privacy concerns, it is important to find a method that can recognise actions without using a visual feed. In this paper, we propose a concept for detecting actions while preserving the test subject's privacy. Our proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. The data trace recording one action consists of a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at a 1 GHz repetition rate. Information about both the distance to the object and its shape is embedded in the traces. We apply machine learning in the form of recurrent neural networks for data analysis and demonstrate successful action recognition. The experimental results show that our proposed method achieves, on average, 96.47% accuracy on the actions walking forward, walking backwards, sitting down, standing up and waving a hand, using a recurrent neural network.
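Since the detector samples the back-scattered light at 1 GHz, each sample in a voltage trace corresponds to 1 ns of round-trip travel time, so the index of the return peak maps directly to distance via d = c·t/2. A minimal sketch of that mapping (the trace values below are made up for illustration; the paper's traces feed a recurrent network rather than this direct peak-picking):

```python
C = 299_792_458.0      # speed of light in vacuum, m/s
SAMPLE_PERIOD = 1e-9   # 1 GHz sampling -> 1 ns per sample

def distance_from_trace(trace):
    """Locate the strongest return in a 1-D voltage trace and convert
    its sample index to a one-way distance (round-trip time halved)."""
    peak_index = max(range(len(trace)), key=lambda i: trace[i])
    round_trip_time = peak_index * SAMPLE_PERIOD
    return C * round_trip_time / 2

# Toy trace whose peak sits at sample 20, i.e. a 20 ns round trip.
trace = [0.0] * 40
trace[20] = 1.0
print(round(distance_from_trace(trace), 3))  # 2.998 (metres)
```

This is why shape as well as distance is embedded in the traces: an extended object returns light over a spread of sample indices, not a single peak, and it is that temporal profile the recurrent network learns from.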